Microsoft COCO Captions: Data Collection and Evaluation Server
Authors
Abstract
In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent, human-generated captions will be provided. To ensure consistency in the evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.
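The same metrics can also be computed offline with the publicly released COCO caption evaluation toolkit. The following is a minimal sketch assuming the pycocotools and pycocoevalcap Python packages are installed; the annotation and result file paths are placeholders, not part of the paper.

# Score candidate captions against COCO ground truth with pycocoevalcap.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Ground-truth captions in COCO annotation format (placeholder path).
coco = COCO("annotations/captions_val2014.json")
# Candidate captions: a JSON list of {"image_id": ..., "caption": ...} (placeholder path).
coco_res = coco.loadRes("results/my_captions_val2014.json")

coco_eval = COCOEvalCap(coco, coco_res)
# Evaluate only the images for which candidate captions were submitted.
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

# Prints BLEU-1..4, METEOR, ROUGE_L and CIDEr scores.
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")

The evaluation server applies the same family of metrics to captions submitted for the held-out test images.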
منابع مشابه
Exploring Nearest Neighbor Approaches for Image Captioning
We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the “consensus” of the set of candidate captions gathered from the nearest neighbor images. When ...
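As a rough illustration of the consensus idea (not necessarily the authors' exact procedure), one can pick the candidate caption that is most similar, on average, to the other captions borrowed from the neighbor images. The token-overlap similarity below is a hypothetical stand-in for metrics such as BLEU or CIDEr.

# Sketch: choose the caption with the highest mean similarity to the
# other candidates gathered from nearest-neighbor images.
def jaccard(a: str, b: str) -> float:
    # Simple token-overlap similarity; a stand-in for BLEU/CIDEr.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def consensus_caption(candidates: list[str]) -> str:
    # Score each candidate by its average similarity to the other candidates.
    def mean_sim(c: str) -> float:
        others = [o for o in candidates if o is not c]
        return sum(jaccard(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=mean_sim)

print(consensus_caption([
    "a man riding a horse on the beach",
    "a person rides a horse near the ocean",
    "a dog chasing a ball in a park",
]))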
ChatPainter: Improving Text to Image Generation using Dialogue
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image and insufficient for the model to be able to understand which objects in the i...
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning
State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images, which is a much more abundant commodity. We here propose a novel way of using such textual data by artificially generating missing visual information. We eva...
Deep CNN Ensemble with Data Augmentation for Object Detection
We report on the methods used in our recent DeepEnsembleCoco submission to the PASCAL VOC 2012 challenge, which achieves state-of-the-art performance on the object detection task. Our method is a variant of the R-CNN model proposed by Girshick et al. [4] with two key improvements to training and evaluation. First, our method constructs an ensemble of deep CNN models with different architectures ...
Generating Images from Captions with Attention
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstr...
Journal: CoRR
Volume: abs/1504.00325
Pages: -
Published: 2015